Abstract. We investigate the spatial clustering of X-ray selected sources in the two deepest X-ray fields to date, namely
the 2Msec Chandra Deep Field North (CDFN) and the 1Msec Chandra Deep Field South (CDFS). The projected correlation
function $w(r_p)$, measured on scales $\sim 0.2 - 10\,h^{-1}$ Mpc for a sample of 240 sources with spectroscopic redshift in the CDFN
and 124 sources in the CDFS at a median redshift of $z \sim 0.8$, is used to constrain the amplitude and slope of the real space
correlation function $\xi(r) = (r/r_0)^{-\gamma}$. The clustering signal is detected at high confidence ($> 7\sigma$) in both fields. The amplitude
of the correlation is found to be significantly different in the two fields, the correlation length $r_0$ being $8.6 \pm 1.2\,h^{-1}$ Mpc in the
CDFS and $4.2 \pm 0.4\,h^{-1}$ Mpc in the CDFN, while the correlation slope $\gamma$ is found to be flat in both fields: $\gamma = 1.33 \pm 0.11$ in the
CDFS and $\gamma = 1.42 \pm 0.07$ in the CDFN (a flat Universe with $\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$ is assumed; $1\sigma$ Poisson error estimates
are considered). The correlation function has also been measured separately for sources classified as AGN or galaxies. In both
fields AGN have a median redshift of $z \sim 0.9$ and a median 0.5-10 keV luminosity of $L_X \sim 10^{43}$ erg s$^{-1}$, i.e. they are generally
in the Seyfert luminosity regime. As in the case of the total samples, we found a significant difference in the AGN clustering
amplitude between the two fields, the best fit correlation parameters being $r_0 = 10.3 \pm 1.7\,h^{-1}$ Mpc, $\gamma = 1.33 \pm 0.14$ in the
CDFS, and $r_0 = 5.5 \pm 0.6\,h^{-1}$ Mpc, $\gamma = 1.50 \pm 0.12$ in the CDFN. In the CDFN, where the statistics are sufficiently high, we
were also able to measure the clustering of X-ray selected galaxies, finding $r_0 = 4.0 \pm 0.7\,h^{-1}$ Mpc and $\gamma = 1.36 \pm 0.15$. Within
each field no statistically significant difference is found between soft and hard X-ray selected sources or between type 1 and
type 2 AGN. After having discussed and ruled out the possibility that the observed variance in the clustering amplitude is due
to observational biases, we verified that the extra correlation signal in the CDFS is primarily due to the two prominent redshift
spikes at $z \sim 0.7$ reported by Gilli et al. (2003). The high ($5 - 10\,h^{-1}$ Mpc) correlation length measured for the X-ray selected
AGN at $z \sim 1$ in the two Chandra Msec fields is comparable to that of early type galaxies at the same redshift. This is consistent
with the idea that, at $z \sim 1$, AGN with Seyfert-like luminosities might be generally hosted by massive galaxies.

Key words. Surveys - Galaxies: active - X-rays: general - Cosmology: large-scale structure of Universe 



1. Introduction 

Active Galactic Nuclei (AGN) represent one of the best tools to 
study the large scale structure of the Universe at intermediate- 
high redshifts, z ~ 1-2, i.e. at an epoch of intense structure 
formation where matter was undergoing the transition from the 
initially smooth state observed at the recombination (z ~ 1000) 
to the clumpy distribution observed at present time (see e.g.
Hartwick & Schade 1990). 

One of the most commonly used statistics to measure the
clustering of a population of sources is the two-point correlation
function $\xi(r)$, which measures the excess probability of
finding a pair of objects at a separation $r$ with respect to a random
distribution and is usually approximated by a power law
$\xi(r) = (r/r_0)^{-\gamma}$. Under simple assumptions, the amplitude of
the AGN correlation function can be used to estimate the typical
mass of the dark matter halos in which AGN reside (Grazian
et al. 2004; Magliocchetti et al. 2004) and the typical AGN lifetimes
(Martini & Weinberg 2001).

The first attempts to measure AGN clustering date back more
than 20 years (Osmer 1981). Since then AGN clustering
has been extensively studied and detected by means of
optical surveys encompassing an increasing number of QSOs
(Shanks et al. 1987; La Franca et al. 1998; Croom et al. 2001;
Grazian et al. 2004). Recently, the 2dF QSO Redshift Survey
(2QZ, Croom et al. 2001) has provided the tightest constraints
on QSO clustering, based on a sample of more than $10^4$ objects:
the QSO correlation length and slope were found to be
$r_0 = 5.7 \pm 0.5\,h^{-1}$ Mpc and $\gamma = 1.56 \pm 0.10$ at a median redshift
of $z = 1.5$ and on comoving scales of $1 - 60\,h^{-1}$ Mpc. This
result confirmed previous measurements and showed that QSO
clustering at $z = 1.5$ is comparable to that of local ($z \sim 0.05$)
optically selected galaxies (Tucker et al. 1997; Ratcliffe et al.
1998). In addition, thanks to the large number of QSOs in their
sample, Croom et al. (2001) were also able to investigate the
evolution of QSO clustering with redshift, finding a marginal
increase by a factor of 1.4 in the $r_0$ value from $z \sim 0.7$ to $z \sim 2.4$
for a flat cosmology with $\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$.

Although optical surveys provide the largest AGN sam- 
ples so far, they include almost exclusively unobscured-type 1 
objects, since AGN candidates are mainly selected by means 
of UV excess techniques. Obscured-type 2 AGN might instead
be efficiently selected by means of mid- and far-infrared
surveys, since the nuclear UV radiation absorbed by the obscuring
medium is expected to be re-emitted at longer wavelengths.
Georgantopoulos & Shanks (1994) analyzed the clustering
properties of a sample of $\sim 200$ local Seyfert galaxies
($z < 0.1$) observed with IRAS and selected through their warm
infrared colors. By comparing the observed number of independent
pairs with that expected from a random sample distributed
over the same scales, they measured a $\sim 3\sigma$ clustering signal
for the total sample, finding marginal evidence that Seyfert 2
galaxies are more clustered than Seyfert 1s.

Perhaps the most efficient way to sample the obscured AGN
population is through X-ray observations, especially in the hard
band, where the nuclear radiation is less affected by absorption.
Based on population synthesis models for the X-ray background
(e.g. Comastri et al. 1995; Gilli et al. 2001; Ueda et al.
2003), obscured AGN are believed to be a factor of $> 4$ more
abundant than unobscured ones and should therefore dominate
the whole AGN population. Spatial clustering of X-ray selected
AGN has been limited so far by the lack of sizable samples of
optically identified X-ray sources. Boyle & Mo (1993) studied
the AGN at $z < 0.2$ in the Einstein Medium Sensitivity Survey
(EMSS, Stocke et al. 1991), without finding any positive clustering
signal. Carrera et al. (1998) considered the AGN in the
ROSAT International X-ray Optical Survey (RIXOS, Mason
et al. 2000) and in the Deep ROSAT Survey (DRS, Boyle et
al. 1994), detecting only a weak ($\sim 2\sigma$) clustering signal on
scales $< 40 - 80\,h^{-1}$ Mpc for the RIXOS AGN subsample
in the redshift range $z = 0.5 - 1.0$. Significant clustering signal
was instead detected from angular correlations by several
authors: Akylas et al. (2000), based on the ROSAT All Sky
Survey (RASS, Voges et al. 1999); Vikhlinin & Forman (1995),
from a compilation of ROSAT PSPC deep pointings; and finally
Giacconi et al. (2001), from the first 130 ksec observation
of the Chandra Deep Field South (Rosati et al. 2002). Very
recently Yang et al. (2003) have claimed that hard X-ray selected
sources have an angular clustering amplitude ten times
higher than that of soft X-ray selected sources. A high angular
clustering amplitude for hard X-ray selected sources, consistent
with that measured by Yang et al. (2003), has also been measured
by Basilakos et al. (2004). In some cases (e.g. Vikhlinin
& Forman 1995; Akylas et al. 2000; Basilakos et al. 2004) the
angular clustering was converted to spatial clustering by means
of Limber's equation, where an a priori redshift distribution
has to be assumed. Unfortunately, because of the several
uncertainties in its assumptions, this method has not provided
stringent results: Akylas et al. (2000) found $r_0 = 5 - 8\,h^{-1}$ Mpc,
Vikhlinin & Forman (1995) $r_0 > 5\,h^{-1}$ Mpc and Basilakos et
al. (2004) $r_0 > 9\,h^{-1}$ Mpc.

To date, the only direct measurement of spatial clustering
of X-ray selected AGN has been obtained from the ROSAT
North Ecliptic Pole survey data (NEP, Gioia et al. 2003). From
a sample of 219 soft X-ray selected AGN, Mullis et al. (2004)
measured a correlation length of $r_0 \approx 7.4\,h^{-1}$ Mpc with $\gamma$
fixed to 1.8. The median redshift of the NEP AGN contributing
to the clustering signal is $z \sim 0.2$ (see also Mullis 2001 for
a preliminary version of that work). Because of the relatively
short exposures in the NEP survey and the limited ROSAT sensitivity,
only bright sources, with a surface density of the order
of 3 deg$^{-2}$, were detected in this sample. In deeper samples,
where the source surface density is higher, the clustering signal
should be detected more easily since the spatial correlation
function is a power law increasing at lower pair separations. In
particular, deep pencil beam surveys are expected to provide
the highest signal significance with the minimum number of
identified objects.

The Chandra Msec surveys in the Deep Field South
(CDFS, Rosati et al. 2002) and North (CDFN, Alexander et
al. 2003) are in this respect the ideal fields to look at, with an
X-ray source surface density of the order of 3000-4000 deg$^{-2}$.
The drawbacks are that these strong signals expected on small
areas may be subject to substantial variance, well beyond the
one implied by Poisson statistics (see Daddi et al. 2001 for a
discussion of this effect in the case of angular clustering), so
that the "real" amplitude of the correlation function would need
a large set of measurements in independent fields to be reliably
estimated. In addition, optical spectroscopy is challenging
for a significant fraction of these X-ray sources with faint optical
magnitude counterparts. We will address these points in the
rest of the paper. A large spectroscopic identification program
down to faint magnitudes ($R < 25.5$) is underway in the CDFS
(Szokoly et al. 2004) and in the CDFN (Barger et al. 2003).
To date, about 40-50% of the X-ray samples have been spectroscopically
identified, revealing that, even at very low fluxes,
AGN are still the most numerous sources populating the X-ray
sky. Here we will take advantage of the spectroscopically identified
sources in the CDFS and CDFN to measure and compare
the spatial clustering of X-ray selected AGN in the two fields.

The paper is organized as follows. In Section 2 we summarize
the X-ray and optical observations of the CDFS and
CDFN and present the source catalogs used in our analysis. In
Section 3 we describe the classification scheme adopted to divide
sources into AGN or galaxies. In Section 4 we describe
the methods used to estimate the projected correlation function
of X-ray selected sources as well as the obtained results, which
are then discussed in Section 5. Conclusions and prospects for
future work are finally presented in Section 6.

Fig. 1. X-ray flux distribution for the total, AGN and galaxy
sample observed in the 2Msec CDFN (upper panel) and 1Msec
CDFS (lower panel). Only sources with robust spectroscopic
redshift are considered. The source classification is based on
the hardness ratio vs luminosity diagram described in Section 3
and shown in Figs. 5 and 6.

Fig. 2. R magnitude distribution for the total, AGN and galaxy
sample observed in the 2Msec CDFN (upper panel) and 1Msec
CDFS (lower panel). Only sources with robust spectroscopic
redshift are considered. The source classification is based on
the hardness ratio vs luminosity diagram described in Section 3
and shown in Figs. 5 and 6.

Throughout this paper we will use a flat cosmology with
$\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$. Unless otherwise stated, we will always
refer to comoving distances in units of $h^{-1}$ Mpc, where
$H_0 = 100\,h$ km s$^{-1}$ Mpc$^{-1}$. Luminosities are calculated using
$h = 0.7$.
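
As an illustration of the distance convention adopted above, the following minimal sketch computes comoving distances in $h^{-1}$ Mpc for a flat cosmology with $\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$. The function names and the midpoint-rule integration are assumptions made for this example only; they are not taken from the analysis described in this paper.

```python
import math

C_KM_S = 299792.458   # speed of light in km/s
OMEGA_M = 0.3         # matter density parameter adopted in the text
OMEGA_L = 0.7         # cosmological constant density parameter adopted in the text


def hubble_e(z):
    """Dimensionless Hubble parameter E(z) = H(z)/H_0 for a flat universe."""
    return math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_distance(z, steps=2000):
    """Line-of-sight comoving distance to redshift z, in h^-1 Mpc.

    Assumes H_0 = 100 h km/s/Mpc and integrates dz / E(z) with a
    simple midpoint rule (an illustrative choice, not a tuned one).
    """
    dz = z / steps
    integral = sum(dz / hubble_e((i + 0.5) * dz) for i in range(steps))
    return (C_KM_S / 100.0) * integral


def comoving_interval(z, delta_z):
    """Comoving length (h^-1 Mpc) spanned by a small redshift interval at z."""
    return (C_KM_S / 100.0) * delta_z / hubble_e(z)


if __name__ == "__main__":
    print(comoving_distance(0.7))         # distance to z = 0.7
    print(comoving_interval(0.7, 0.002))  # length spanned by dz = 0.002 at z = 0.7
```

With these definitions, a redshift interval of 0.002 maps to a few $h^{-1}$ Mpc at $z \sim 0.7$, while an interval of order 0.1 maps to a few hundred $h^{-1}$ Mpc.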

2. X-ray and optical data 

2.1. CDFS 

The CDFS has been observed with 11 ACIS-I pointings for a
total 1 Msec exposure (Rosati et al. 2002). X-ray sources have
been detected down to limiting fluxes of $5.5 \times 10^{-17}$ erg cm$^{-2}$
s$^{-1}$ (hereafter cgs) and $4.5 \times 10^{-16}$ cgs in the soft (0.5-2 keV)
and hard (2-10 keV) band, respectively. Overall, 307 sources
have been detected in the soft band and 251 sources in the
hard band for a total sample of 346 sources distributed over
the whole 0.1 deg$^2$ field. The full X-ray catalog and the details
of the detection process have been presented by Giacconi
et al. (2002). The optical follow-up photometry was primarily
performed using the FORS1 camera at the VLT (Szokoly et al.
2004). The combined R band data cover a $13.6 \times 13.6$ arcmin
field to limiting magnitudes between 26 and 26.7. In the area
not covered by FORS mosaics, we used shallower data from
the ESO Imaging Survey (EIS, Arnouts et al. 2001). The optical
identification process is described in Tozzi et al. (2001)
and Giacconi et al. (2002). Optical spectroscopy for most of
the X-ray counterparts with $R < 24$ has been obtained with
FORS1 during several observational runs at the VLT. About
20 spectra of optically faint sources with $24 < R < 26$ were
also collected. The details of the spectroscopic data reduction
and analysis are presented in Szokoly et al. (2004). So far 169
redshifts have been obtained. Quality flags have been assigned
to the spectra, according to their reliability. Here we consider
only the 127 X-ray point-like sources (excluding stars) with
spectral quality flag $Q > 2$, where two or more lines have been
observed in the spectrum of the optical counterpart and the redshift
determination is unambiguous. The X-ray flux and R band
magnitude distribution for these sources are shown in the lower
panels of Fig. 1 and Fig. 2, respectively. We estimated the redshift
accuracy by considering the $\sim 40$ sources with at least two
independent redshift measurements, both with $Q > 2$, obtained
in different observing runs (see Table 5 of Szokoly et al. 2004).
The distribution of the redshift differences has a relatively large
dispersion of $\sigma(\Delta z) \sim 0.005$. When removing two outliers with
a $3\sigma$ clipping technique (both outliers are broad-line AGN
for which a precise redshift determination is more difficult), the
observed dispersion decreases to $\sigma(\Delta z) \sim 0.003$, corresponding
to an average uncertainty in a single redshift measurement
of $\Delta z \sim 0.003/\sqrt{2} \sim 0.002$.$^1$

Fig. 3. Redshift distribution for point-like X-ray sources in
the CDFS in bins of $\Delta z = 0.02$. Only sources with robust
spectroscopic redshift have been considered. The solid curve
shows the selection function obtained by smoothing the observed
redshift distribution. The inset shows the redshift distribution
of CDFS sources as a function of their classification
(see Section 3).

As shown in Fig. 3, the redshift
distribution is dominated by two large concentrations of
sources at $z = 0.67$ and $z = 0.73$, while other smaller peaks are
also visible (see also Gilli et al. 2003), already demonstrating
that X-ray sources in the CDFS are highly clustered. The final
spectroscopic completeness is $\sim 35\%$. This fraction increases
to 78% for the subsample of X-ray sources with optical counterparts
brighter than $R = 24$. We stress that in our measurements
it is essential to consider only sources with small redshift errors,
otherwise the clustering signal in redshift space would be
washed out. The typical measurement errors in the photometric
redshifts of CDFS sources (Zheng et al. 2004) are of the order
of $\Delta z \sim 0.14$, corresponding to $\sim 270\,h^{-1}$ Mpc comoving at the
median CDFS redshift of 0.7. The above redshift uncertainty
would significantly dilute the clustering signal in the considered
field (which is dominated by redshift clustering) and therefore
photometric redshifts cannot be used for our purposes.
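
The redshift-accuracy estimate used in this section (dispersion of the differences between repeated measurements, $3\sigma$ clipping of outliers, division by $\sqrt{2}$) can be sketched as follows. This is only an illustrative sketch: the function names, the iterative form of the clipping and the input values in the usage example are hypothetical, and the $\sqrt{2}$ factor assumes that the two measurements of each source have equal and independent errors.

```python
import math
import statistics


def clipped_dispersion(diffs, n_sigma=3.0, max_iter=10):
    """Dispersion of `diffs` after iterative n-sigma clipping.

    Returns (dispersion, number_of_clipped_values). The iteration scheme
    is an assumption of this sketch, not a description of the paper's code.
    """
    kept = list(diffs)
    for _ in range(max_iter):
        mean = statistics.fmean(kept)
        sigma = statistics.pstdev(kept)
        survivors = [d for d in kept if abs(d - mean) <= n_sigma * sigma]
        if len(survivors) == len(kept):
            break  # nothing was rejected in this pass
        kept = survivors
    return statistics.pstdev(kept), len(diffs) - len(kept)


def single_measurement_error(sigma_diff):
    """Error on one measurement, given the dispersion of pairwise differences."""
    return sigma_diff / math.sqrt(2.0)


if __name__ == "__main__":
    # Hypothetical redshift differences with one obvious outlier.
    diffs = [0.002, -0.002] * 10 + [0.05]
    sigma, n_clipped = clipped_dispersion(diffs)
    print(sigma, n_clipped, single_measurement_error(sigma))
```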

2.2. CDFN 



The Chandra Deep Field North (CDFN, Alexander et al. 2003;
Barger et al. 2003), which is centered on the Hubble Deep Field
North (Williams et al. 1996), is the analog of the CDFS in the
Northern hemisphere. The CDFN has been observed with 20
ACIS-I pointings for a total 2 Msec exposure. Limiting fluxes
of $\sim 2.5 \times 10^{-17}$ cgs and $\sim 1.4 \times 10^{-16}$ cgs have been reached in
the soft and hard band, respectively. A total sample of 503 X-ray
sources (451 of which are detected in the soft band and 332
in the hard band) has been collected over an area of 0.13 deg$^2$.

Fig. 4. Same as Fig. 3 but for CDFN sources.



$^1$ We note that the value of 0.005 quoted by Szokoly et al. (2004) as
the typical uncertainty in the redshift determination is a conservative
$\sim 3\sigma$ boundary.



The full X-ray catalog is found in Alexander et al. (2003) and
the details of the optical identification program have been published
by Barger et al. (2003). The LRIS and DEIMOS instruments
at the Keck telescope were primarily used for the optical
follow-up of the X-ray sources. A few additional identifications
were added by cross correlating the X-ray with the optical
catalog of the Caltech Faint Galaxy Redshift Survey (Cohen
et al. 2000) which covers the inner 50 arcmin$^2$ of the CDFN
and has a spectroscopic completeness of about 90% down to
$R = 24$ in the Hubble Deep Field and to $R = 23$ in the surrounding
flanking fields. Most of the redshifts in the Barger et al.
(2003) catalog have been obtained from spectra with multiple
lines, and should therefore be comparable to the $Q > 2$ redshifts
of the CDFS catalog. We ignored the 13 CDFN sources
for which the redshift estimate is not based on two or more
emission/absorption lines (see Barger et al. 2003). No additional
high quality redshifts were obtained by cross-correlating
the X-ray catalog of Alexander et al. (2003) with the two recently
published spectroscopic catalogs of the ACS-GOODS
survey in the CDFN (Cowie et al. 2004; Wirth et al. 2004).
The final considered catalog includes 252 sources, corresponding
to a spectroscopic completeness of $\sim 50\%$. We estimated
the typical redshift errors (not quoted in Barger et al. 2003)
by comparing the common redshifts with high quality in the
catalogs of Barger et al. (2003) and Cohen et al. (2000). We
found that the measurements in the two catalogs are in very
good agreement, with essentially zero offset and a dispersion
of $\sigma(\Delta z) < 0.002$, indicating that the redshift accuracy in each
catalog is better than this value. The redshift distribution for the
considered spectroscopic sample is shown in Fig. 4. As in the
case of the CDFS redshift distribution, several redshift spikes
can be immediately identified, the most prominent of which are at
$z \sim 0.85$ and $z \sim 1.02$ (see Barger et al. 2003).

Fig. 5. The "classification diagram", i.e. hardness-ratio vs. observed
0.5-10 keV luminosity, for the CDFS sources.

Although the general shape of the CDFN redshift distribution
peaks at $z \sim 0.7 - 0.8$, similarly to that observed in the
CDFS (see e.g. the smoothed curves in Figs. 3 and 4), a few
differences can be noticed between the two. One obvious effect
is produced by the several spikes which trace structures at
different redshifts. More interestingly, the fraction of low redshift
sources is higher in the CDFN than in the CDFS. As an
example, 28% of CDFN sources lie at $z < 0.5$, while the corresponding
fraction in the CDFS is 17%. This difference can be
readily explained by the deeper CDFN exposure, which is able
to pick up the faint X-ray emission of nearby normal and starburst
galaxies (see the next Section and the insets of Figs. 3 and
4). The X-ray flux and R band magnitude distributions for the
CDFN sources with good redshift estimate considered in this
paper are shown in the upper panel of Fig. 1 and Fig. 2, respectively.
Due to the higher exposure time, in the CDFN the source
flux distribution has a larger fraction of objects at the faintest
0.5-10 keV fluxes with respect to that observed in the
CDFS$^2$. As mentioned above, most of these faint sources are
classified as galaxies. The R-band magnitude distributions are
instead more similar, with most of the spectroscopically confirmed
sources in the range $19 < R < 24$ in both samples, confirming
that the spectroscopic observations have been equally
deep in both fields.

Fig. 6. Same as Fig. 5 but for CDFN sources.

$^2$ It is worth noting that the 0.5-10 keV flux of the faintest CDFS
sources ($f_{0.5-10\,\mathrm{keV}} \sim 10^{-16}$ cgs) is likely to be underestimated in Fig. 1.
Indeed, since in the CDFS no X-ray photometry was performed in the
total X-ray band, the 0.5-10 keV flux is obtained by simply summing
the flux in the soft and in the hard band. Therefore, for sources detected
in the soft band only (most of which are at the faintest fluxes),
the 0.5-10 keV flux simply corresponds to the 0.5-2 keV flux, and
some residual flux above 2 keV is lost. On the contrary, in the CDFN
the X-ray photometry has been performed also in the total 0.5-10 keV
band even for sources not detected in the 2-10 keV band, whose total
flux is then always higher than the soft flux.

3. Source classification

In order to measure the clustering properties of different populations,
we classified our sources following the scheme presented
by Szokoly et al. (2004) for CDFS sources, where X-rays
are the main tool to infer information on the physical
nature of each object. We somewhat simplified that scheme by
avoiding the luminosity distinction between type-2 AGN/QSOs
and between type-1 AGN/QSOs. Our adopted classification
scheme can then be summarized as follows:

type-1 AGN : HR < -0.2 and $\log L_{0.5-10} > 42$

type-2 AGN : HR > -0.2

galaxy : HR < -0.2 and $\log L_{0.5-10} < 42$,

where HR = (H - S)/(H + S) is the X-ray hardness ratio, i.e.
the difference between the hard (H) and soft (S) band counts
normalized to the total counts, and $L_{0.5-10}$ is the observed 0.5-10
keV luminosity in units of erg s$^{-1}$.

The cut at HR = -0.2 between type-1 and type-2 AGN is
motivated by the fact that most of the AGN with broad optical
lines (31/32) lie below this limit, while the majority of narrow
line AGN (16/21) are found above it. The adopted classification
scheme is admittedly crude, but it can be considered a reasonable
approach when dealing with sources with faint optical
spectra, for which detailed line diagnostics is difficult.
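
The classification scheme of Section 3 (hardness ratio HR = (H - S)/(H + S), with cuts at HR = -0.2 and at an observed 0.5-10 keV luminosity of $10^{42}$ erg s$^{-1}$) can be written as a short function. This is a minimal sketch: the function names are hypothetical, and the treatment of sources lying exactly on a boundary (HR = -0.2 or $\log L = 42$) is an assumption, since the text does not specify it.

```python
def hardness_ratio(hard_counts, soft_counts):
    """X-ray hardness ratio HR = (H - S) / (H + S)."""
    return (hard_counts - soft_counts) / (hard_counts + soft_counts)


def classify_source(hr, log_lx):
    """Classify a source from its hardness ratio and log10 of the observed
    0.5-10 keV luminosity (erg/s).

    Boundary handling is an assumption of this sketch: HR = -0.2 is treated
    as soft, and log L = 42 is treated as a galaxy.
    """
    if hr > -0.2:
        return "type-2 AGN"
    if log_lx > 42.0:
        return "type-1 AGN"
    return "galaxy"


if __name__ == "__main__":
    # Hypothetical sources: (hard counts, soft counts, log L_X)
    for hard, soft, log_lx in [(30, 70, 43.1), (60, 40, 42.5), (10, 90, 40.8)]:
        hr = hardness_ratio(hard, soft)
        print(hr, log_lx, classify_source(hr, log_lx))
```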
Fig. 7. Distribution on the sky of CDFS sources with robust
redshift measurements. Different source classes are represented
with different symbols as labeled. The box indicates the $6.7 \times 4.8$
arcmin region covered by the K20 survey (Cimatti et al.
2002). The dashed circle of 8 arcmin radius is the region with
higher ($\sim 50\%$) spectroscopic completeness.

To keep a uniform classification criterion in the two fields,
we applied the above scheme also to CDFN sources (see also
Hasinger 2003). As a consistency check, we computed the X-ray
hardness ratio for CDFN sources based on the soft and hard
counts presented in the Alexander et al. (2003) catalog, and
verified that also for this sample objects with broad optical lines
have HR < -0.2 as in the CDFS.

The adopted "classification diagram", i.e. the hardness ratio
vs. X-ray luminosity plot, is shown in Figs. 5 and 6 for the CDFS
and CDFN sources, respectively, and the classification breakdown
is shown in Table 1 (only sources with $L_{0.5-10} > 10^{40}$
erg s$^{-1}$ are considered, see Section 4.2). We point out that the
significantly higher fraction of galaxies found in the 2Msec
CDFN with respect to the 1Msec CDFS is due to the exposure of
the CDFN being twice as long. As shown in Fig. 6, the line at
$\log L_{0.5-10} = 42$ appears to sharply divide a smooth source distribution
into two distinct classes (galaxies and type-1 AGN).
It is therefore likely that each class contains some misclassified
objects. Indeed, part of the soft sources with $\log L_{0.5-10} < 42$
might harbour a low luminosity AGN and, on the other hand,
galaxies with intense star formation might have X-ray luminosities
exceeding $\log L_{0.5-10} = 42$. Nonetheless, the fraction
of misclassified objects should be of the order of a few percent
in each class and therefore we do not expect any significant
impact on our clustering measurements.

Fig. 8. Same as Fig. 7 but for CDFN sources. Symbols are as in
the previous Figure. The 4 arcmin radius circle approximately
shows the area covered by the Hubble Deep and Flanking fields
(Cohen et al. 2000).

Table 1. Source classification breakdown. Only sources with
$L_{0.5-10} > 10^{40}$ erg s$^{-1}$ are considered.

Sample        Type 1   Type 2   Gal
CDFS            45       52      27
2Msec CDFN      89       71      80
1Msec CDFN      79       60      37

The spatial distributions of the X-ray sources in the CDFS
and CDFN as a function of their spectroscopic classification
are shown in Figs. 7 and 8, respectively. As is evident in
Fig. 8, most of the CDFN galaxies are found in the center of the
field, where the X-ray sensitivity is highest. When applying the
above classification scheme to the 189 CDFN sources with robust
redshift measurement detected in the first 1Msec exposure
(Brandt et al. 2001; Barger et al. 2002), we found that, while
the number of AGN drops by $\sim 15\%$, the number of galaxies
drops by more than a factor of 2, i.e. from 80 to 37. Then,
when accounting for the different spectroscopic completeness,
the number of galaxies found in the 1Msec CDFN is in agreement
with that found in the CDFS. We also caution the reader
that the ratio between type-2 and type-1 AGN one might derive
from Table 1 is a lower limit rather than the real ratio in
these deep X-ray fields: first of all, the optical identifications
are largely incomplete and the fraction of type-2 AGN is expected
to be higher among unidentified sources, which are on
average harder than those already identified (we indeed verified
that in both fields the type-2/type-1 ratio increases towards faint
R magnitudes); second, the number of obscured sources misclassified
as type-1, as defined on the basis of the classification
adopted here, is likely to be higher than the number of unobscured
sources misclassified as type-2. This can be seen for
example in Fig. 9 of Tozzi et al. (2001), where it is shown how
the observed hardness ratio decreases with redshift for a given
value of the obscuring column density $N_H$. In Gilli et al. (2003)
we classified an X-ray source as AGN with slightly different
criteria from those adopted here. In particular, we considered
to be AGN those sources satisfying at least one of the following
conditions: $L_{0.5-10} > 10^{42}$ erg s$^{-1}$, HR > 0, $f_X/f_R > 0.1$,
where $L_{0.5-10}$ is the observed 0.5-10 keV luminosity and $f_X/f_R$
is the ratio between the 0.5-10 keV flux and the R band flux
(see Section 4.1 of Gilli et al. 2003 for details). We verified
that the two classification criteria provide very similar results.
Indeed, $\sim 97\%$ of the sources classified as AGN by one method
are also classified as AGN by the other.
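
A comparison between the two AGN selection criteria discussed above can be sketched as follows. The function names and the input sources are hypothetical, and the agreement statistic (sources flagged by both criteria divided by sources flagged by at least one) is only one possible reading of the overlap quoted in the text; boundary cases are handled by strict inequalities as an assumption.

```python
def is_agn_adopted(hr, log_lx):
    """AGN (type-1 or type-2) according to the scheme of Section 3."""
    return hr > -0.2 or log_lx > 42.0


def is_agn_gilli2003(hr, log_lx, fx_over_fr):
    """AGN if at least one condition holds: L > 10^42 erg/s, HR > 0, f_X/f_R > 0.1."""
    return log_lx > 42.0 or hr > 0.0 or fx_over_fr > 0.1


def agreement_fraction(sources):
    """Fraction of sources flagged by either criterion that are flagged by both.

    `sources` is an iterable of (hr, log_lx, fx_over_fr) tuples.
    """
    both = 0
    either = 0
    for hr, log_lx, fx_over_fr in sources:
        a = is_agn_adopted(hr, log_lx)
        b = is_agn_gilli2003(hr, log_lx, fx_over_fr)
        both += a and b
        either += a or b
    return both / either if either else float("nan")


if __name__ == "__main__":
    # Hypothetical sources: (HR, log L_X, f_X / f_R)
    sample = [(-0.5, 43.0, 1.0), (0.3, 41.0, 0.05), (-0.1, 41.0, 0.01), (-0.5, 41.0, 0.01)]
    print(agreement_fraction(sample))
```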

4. The spatial correlation function 

4.1. Analysis techniques 

The most widely used statistic to measure the clustering properties
of a source population is the two point correlation function
$\xi(r)$, defined as the excess probability of finding a pair with
one object in the volume $dV_1$ and the other in the volume $dV_2$,
separated by a comoving distance $r$ (Peebles 1980):

$$dP = n^2 [1 + \xi(r)]\, dV_1\, dV_2 \qquad (1)$$



A related quantity, which is what we actually measure in
this paper, is the so-called projected correlation function:

$$w(r_p) = 2 \int_0^{r_{v0}} \xi(r_p, r_v)\, dr_v , \qquad (2)$$

where $\xi(r_p, r_v)$ is the two point correlation function expressed
in terms of the separations perpendicular ($r_p$) and parallel
($r_v$) to the line of sight as defined in Davis & Peebles
(1983) and applied to comoving coordinates. The advantage of
using the integral quantity $w(r_p)$ rather than directly estimating
the two point correlation function in redshift space $\xi(s)$ is that
$w(r_p)$ is not sensitive to distortions introduced on small scales
by peculiar velocities and errors on redshift measurements.

If the real space correlation function can be approximated
by a power law of the form $\xi(r) = (r/r_0)^{-\gamma}$ and $r_{v0} = \infty$, then the
following relation holds (Peebles 1980):

$$w(r_p) = A(\gamma)\, r_0^{\gamma}\, r_p^{1-\gamma} , \qquad (3)$$

where $A(\gamma) = \Gamma(1/2)\,\Gamma[(\gamma - 1)/2]/\Gamma(\gamma/2)$ and $\Gamma(x)$ is
Euler's Gamma function. $A(\gamma)$ increases from 3.68 when $\gamma = 1.8$
to 7.96 when $\gamma = 1.3$.
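
The power-law relation between $w(r_p)$, $r_0$ and $\gamma$ can be checked numerically. The sketch below evaluates $A(\gamma)$ with the standard-library Gamma function and inverts the relation to recover $r_0$ from a value of $w(r_p)$ at fixed $\gamma$. The function names are hypothetical; actual fits are performed over many $r_p$ bins rather than by inverting a single point.

```python
import math


def amplitude_a(gamma):
    """A(gamma) = Gamma(1/2) * Gamma((gamma - 1)/2) / Gamma(gamma/2)."""
    return math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0) / math.gamma(gamma / 2.0)


def projected_w(r_p, r0, gamma):
    """Projected correlation function w(r_p) = A(gamma) * r0^gamma * r_p^(1 - gamma)."""
    return amplitude_a(gamma) * r0 ** gamma * r_p ** (1.0 - gamma)


def r0_from_w(w, r_p, gamma):
    """Invert the power-law relation for r0 at a single r_p (illustration only)."""
    return (w / (amplitude_a(gamma) * r_p ** (1.0 - gamma))) ** (1.0 / gamma)


if __name__ == "__main__":
    print(amplitude_a(1.8), amplitude_a(1.3))
    w = projected_w(2.0, 8.6, 1.33)
    print(w, r0_from_w(w, 2.0, 1.33))
```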

A practical integration limit r,,o has to be chosen in Eq. 2 
in order to maximize the correlation signal. Indeed, one should 
avoid too large r,« values which would mainly add noise to the 
estimate of w{r p ). On the other hand too small scales, com- 
parable with the redshift uncertainties and with the pairwise 
velocity dispersions, (i.e. the dispersion in the distribution of 
the relative velocities of source pairs), should also be avoided 
since they would not allow to recover the whole signal. A red- 
shift uncertainty of Az < 0.002 (the typical value observed in 
our samples) corresponds to comoving scales below 6.7 hr x 
Mpc at all redshifts. The average velocity dispersion measured 
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Fig. 9. Measured correlation length 7*0 and slope y as a function 
of r„o, i.e. the integration limit on w(r p ) (see Eq.3), for the total 
samples in the CDFN (filled circles) and CDFS (open circles). 
We choose rvo = 10 h~ l Mpc as our integration radius. For 
lower r V Q values the correlation signal is not fully recovered, 
while for higher values the noise increases. 



by Cohen et al. ypOO I for the redshift spikes observed in the 
Hubble Deep and Flanking fields is of the order of 400 km 
corresponding to Az ~ 0.002 at z ~ 0.7. At these redshifts 
the pairwise velocity dispersion should be of the same order. 
Indeed, the value measured in the local Universe (500 - 600 
km Marzke et al. 1995 Zehavi et al. 2002 1 is expected 
to decrease by ~ 15% at a redshift of 0.7 (see e.g. the ACDM 
simulations by Kauffmann et al. 1999 1. We further checked that 
the velocity dispersion measured for the redshift structures of 
X-ray sources in the CDFS and CDFN corresponds typically to 
< 10 hT x Mpc. To search for the best integration radius r V Q we 
measured w(r p ) for the CDFS and CDFN total samples for dif- 
ferent r,,o values ranging from 3 to 100 h~ l Mpc. The obtained 
correlation length and slope as a function of r,,o are shown in 
Fig. [9] We note that ro decreases for r,o values smaller than 
10 A 1 Mpc, showing that the signal is not fully recovered. For 
r v o values greater than 10 hr l Mpc ro does not vary signifi- 
cantly, but the errorbars are higher. This behaviour, which is 
more evident for the CDFS sample, is similar to that observed 



by Carlberg et al. (2000) for the galaxies in the CNOC2 sample
(Yee et al. 2000). The slope of the correlation is rather constant
over most of the r_v0 range. For the CDFN sample a steepening
of γ is observed at r_v0 = 50 - 90 h^-1 Mpc. However, at these
large radii, the errors are large and the measured slope is consistent
within < 2σ with the value obtained for r_v0 = 10 h^-1
Mpc. We therefore consider the observed steepening as a fluctuation
which is not statistically significant and in the following
we will fix r_v0 to 10 h^-1 Mpc.
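
As a rough cross-check of the numbers quoted above, the following sketch converts a line-of-sight velocity into a redshift offset and into a comoving radial displacement. It assumes the flat cosmology adopted in this paper (Ω_m = 0.3, Ω_Λ = 0.7, H_0 = 100 h km s^-1 Mpc^-1); the function names are illustrative and not part of the original analysis.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def hubble_e(z, omega_m=0.3, omega_l=0.7):
    """Dimensionless Hubble parameter E(z) = H(z)/H0 for a flat LCDM model."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + omega_l)


def velocity_to_dz(v_km_s, z):
    """Observed redshift offset produced by a line-of-sight velocity at redshift z."""
    return (1.0 + z) * v_km_s / C_KM_S


def velocity_to_comoving(v_km_s, z, omega_m=0.3, omega_l=0.7):
    """Comoving radial displacement in h^-1 Mpc, i.e. c*dz / H(z) with H0 = 100 h."""
    dz = velocity_to_dz(v_km_s, z)
    return C_KM_S * dz / (100.0 * hubble_e(z, omega_m, omega_l))


# 400 km/s at z = 0.7, the case discussed in the text
dz = velocity_to_dz(400.0, 0.7)
dr = velocity_to_comoving(400.0, 0.7)
print(f"dz = {dz:.4f}, radial displacement = {dr:.1f} h^-1 Mpc")
```

Under these assumptions a 400 km s^-1 dispersion gives Δz of about 0.002 and a radial smearing of a few h^-1 Mpc, below the adopted integration radius of 10 h^-1 Mpc.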

To measure ξ(r_p, r_v) we created random samples of sources
in our fields and measured the excess of pairs at separations
(r_p, r_v) with respect to the random distribution. We used the
minimum variance estimator proposed by Landy & Szalay
(1993), which is found to have a nearly Poissonian variance:

\[
\xi(r_p, r_v) = \frac{a\,DD(r_p, r_v) - 2b\,DR(r_p, r_v) + RR(r_p, r_v)}{RR(r_p, r_v)}, \qquad (4)
\]

where DD, DR and RR are the number of data-data, data-random
and random-random pairs at separations r_p ± Δr_p and
r_v ± Δr_v, a = n_r (n_r - 1) / [n_d (n_d - 1)] and b = (n_r - 1) / (2 n_d), where n_d
and n_r are the total number of sources in the data and random
sample, respectively.
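
The estimator of Eq. 4 can be sketched in a few lines. The function below is only an illustration: it assumes that DD and RR count each unordered pair once and that DR counts every data-random pair, which is the convention under which the normalisations a and b defined above give ξ = 0 for an unclustered sample. The function name is hypothetical.

```python
def landy_szalay(dd, dr, rr, n_d, n_r):
    """Landy & Szalay estimate of xi in one (r_p, r_v) bin from raw pair counts.

    dd, rr: unordered data-data and random-random pair counts in the bin.
    dr:     data-random pair counts in the bin.
    n_d, n_r: total number of sources in the data and random samples.
    """
    if rr <= 0:
        raise ValueError("RR must be positive in the bin")
    a = n_r * (n_r - 1) / (n_d * (n_d - 1))
    b = (n_r - 1) / (2.0 * n_d)
    return (a * dd - 2.0 * b * dr + rr) / rr


# Example: 1% of all possible pairs fall in the bin for every pair type,
# so the data are as clustered as the randoms and xi should vanish.
n_d, n_r = 100, 5000
frac = 0.01
dd = frac * n_d * (n_d - 1) / 2
dr = frac * n_d * n_r
rr = frac * n_r * (n_r - 1) / 2
print(landy_szalay(dd, dr, rr, n_d, n_r))
```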

Both the redshift and the coordinate (α, δ) distributions of
the identified sources are potentially affected by observational 
biases. In particular, the redshift distribution may be biased by 
the presence of a limiting magnitude beyond which spectroscopic
redshifts cannot be obtained. The (α, δ) distribution,
on the other hand, is affected by at least two biases: the X-ray 
bias, due to the non-uniform X-ray sensitivity limits over the 
field of view, and the spectroscopic bias, due to the position- 
ing of the masks within the field and of the slits within the 
masks. For this reason special care has to be taken in creating 
the sample of random sources. The redshifts of these sources 
were randomly extracted from a smoothed distribution of the 
observed one. This procedure should include in the redshift selection
function the same biases affecting the observed distribution.
We assumed a Gaussian smoothing length σ_z = 0.3 as a
good compromise between too small smoothing scales (which 
suffer from significant fluctuations due to the observed spikes) 
and too large scales (where on the contrary the source density 
of the smoothed distribution at a given redshift might not be a
good estimate of the average observed value). We verified that
our results do not change significantly when using a smoothing
length in the range σ_z = 0.2-0.4. The smoothed redshift
distributions adopted for our simulations, shown in Fig. 5 and 4 for
the CDFS and CDFN, respectively, have very similar shapes
peaking at z ~ 0.7. We assumed that the clustering amplitude
is constant with redshift and did not try to estimate clustering 
variations at different redshifts. Indeed, the clustering signal in 
a given redshift interval will strongly depend on small vari- 
ations in the choice of the interval boundaries, which might 
include or exclude prominent redshift spikes from the interval, 
hence producing extremely high fluctuations in the r_0 vs. z measurements.
Since the X-ray sensitivity varies across the field of
view, in particular with off-axis angle, we checked if there are 
significant differences in the redshift distribution of sources as 
a function of their off-axis angles. In particular we compared 
the distributions of sources inside and outside a given off-axis 
angle with a Kolmogorov-Smirnov (hereafter KS) test. We re- 
peated the KS test for several source subsamples (e.g. AGN, 
galaxies) in the CDFS and CDFN and for different off-axis an- 
gles. With the exception of the galaxies in the CDFN, for which 
the average redshift at off-axis angles below 4 arcmin is found 
to be significantly higher than that outside this region, we do 
not find any significant difference in the other subsamples. In 
the following we will then generate the redshift distribution for 



the random samples by simply smoothing the total distribution 
observed in each subsample. The case of CDFN galaxies will 
be discussed in detail in Section 4.2.2. 
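
The comparison of redshift distributions inside and outside a given off-axis angle can be illustrated with a self-contained two-sample KS test. This is a sketch, not the code used for the paper: the p-value uses the standard asymptotic series with a commonly used small-sample correction, and the function name and the example redshift lists are hypothetical.

```python
import math


def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D and asymptotic p-value."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n1 and j < n2:
        v = min(xs[i], ys[j])
        # advance both empirical CDFs past every value <= v (handles ties)
        while i < n1 and xs[i] <= v:
            i += 1
        while j < n2 and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    n_eff = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        return d, 1.0  # the series converges poorly here; the probability is ~1
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical redshifts of sources inside and outside a chosen off-axis angle
z_inner = [0.45, 0.52, 0.68, 0.84, 0.85, 1.01, 1.02, 1.15]
z_outer = [0.11, 0.20, 0.28, 0.41, 0.45, 0.47, 0.51, 0.64]
d_stat, p_value = ks_two_sample(z_inner, z_outer)
print(f"D = {d_stat:.2f}, p = {p_value:.3f}")
```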

The coordinates (α, δ) of the random sources were extracted
from the coordinate ensemble of the real sample, thus repro- 
ducing on the random sample the same uneven distribution on 
the plane of the sky of the real sources (e.g. in both the CDFS 
and CDFN the X-ray sources were identified preferentially at 
the center of the field). This procedure, if anything, would di- 
lute the correlation signal, since it removes the effects of angu- 
lar clustering. We note however that we do not expect a strong 
signal from angular clustering in these deep pencil-beam sur- 
veys, where the radial coordinate spans a much broader dis- 
tance than the transverse coordinate and the clustering signal 
should be dominated by redshift clustering (see the tests with 
random coordinates in the next section). 

The source density adopted in the random samples is a fac- 
tor of 50-100 larger than that of the data sample depending on 
its size. More details on the chosen way to construct the ran- 
dom source sample, as well as several checks on its validity 
will be discussed in the next Section. 
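
A minimal sketch of the random catalogue construction described above is given below. Positions are drawn from the coordinates of the real sources, while redshifts are drawn from the observed distribution smoothed with a Gaussian of width σ_z; adding Gaussian scatter to a randomly chosen observed redshift is equivalent to sampling the smoothed distribution. The rejection of redshifts outside the allowed range, the function name and the example coordinates are assumptions made for illustration.

```python
import random


def build_random_catalogue(real_sources, factor=50, sigma_z=0.3,
                           z_range=(0.0, 4.0), seed=None):
    """Return factor * len(real_sources) random (ra, dec, z) tuples.

    real_sources: list of (ra, dec, z) for the sources with measured redshift.
    """
    rng = random.Random(seed)
    redshifts = [z for _, _, z in real_sources]
    n_target = factor * len(real_sources)
    catalogue = []
    while len(catalogue) < n_target:
        # sky position copied from a real source (keeps the uneven sky coverage)
        ra, dec, _ = rng.choice(real_sources)
        # redshift drawn independently from the Gaussian-smoothed distribution
        z = rng.choice(redshifts) + rng.gauss(0.0, sigma_z)
        if z_range[0] <= z <= z_range[1]:
            catalogue.append((ra, dec, z))
    return catalogue


# Hypothetical input: three sources with (ra, dec, z)
real = [(53.10, -27.80, 0.67), (53.15, -27.75, 0.73), (53.05, -27.85, 1.20)]
randoms = build_random_catalogue(real, factor=50, seed=1)
print(len(randoms), randoms[0])
```

Drawing position and redshift independently is what removes the redshift spikes from the control sample while preserving the selection effects on the sky.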

We binned the source pairs in intervals of Δlog r_p = 0.4 and
measured w(r p ) in each bin. The resulting datapoints were then 
fitted by a power law of the form given in Eq. 3, and the best 
fit parameters γ and r_0 were determined via χ² minimization.
Given the small number of pairs which fall into some bins (es- 
pecially at the smallest scales), we used the formulae of Gehrels 
(1986) to estimate the 84% confidence upper and lower limits,
containing the 68% confidence interval (i.e. 1σ error bars in
Gaussian statistics). It is well known that Poisson errorbars un- 
derestimate the uncertainties on the correlation function when 
source pairs are not independent, i.e. if the considered objects 
generally appear in more than one pair. In the samples considered
here, this is indeed the case at scales r_p > 1 h^-1 Mpc.
On the other hand, bootstrap resampling techniques (e.g. Mo,
Jing & Börner 1992), which are often used to circumvent this
problem, may substantially overestimate the real uncertainties.
We tested bootstrap errors for our samples, finding that the un- 
certainties on the correlation function parameters increase by 
a factor of ~ 2 with respect to the Poissonian case. In the following
we will simply quote r_0 and γ together with their 1σ
Poisson errors, bearing in mind that the most likely uncertainty
lies between the quoted number and its double.
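
For small pair counts the 84% one-sided Poisson limits can be approximated analytically. The sketch below uses approximations commonly quoted from Gehrels (1986); it illustrates the kind of limits involved and is not necessarily the exact set of formulae applied in this work. The function names are hypothetical.

```python
import math


def gehrels_upper(n, s=1.0):
    """Approximate Poisson upper limit for n observed events (s = 1 -> 84.13%)."""
    return n + s * math.sqrt(n + 0.75) + (s * s + 3.0) / 4.0


def gehrels_lower(n, s=1.0):
    """Approximate Poisson lower limit for n observed events (s = 1 -> 84.13%)."""
    if n == 0:
        return 0.0
    return n * (1.0 - 1.0 / (9.0 * n) - s / (3.0 * math.sqrt(n))) ** 3


for n_pairs in (1, 5, 20, 100):
    lo, hi = gehrels_lower(n_pairs), gehrels_upper(n_pairs)
    print(f"N = {n_pairs:3d}: {lo:7.2f} .. {hi:7.2f}   (sqrt(N) = {math.sqrt(n_pairs):.2f})")
```

The asymmetry of the interval is strong for a handful of pairs and approaches the symmetric sqrt(N) error bar for large counts.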

4.2. Results 
4.2.1. CDFS 

We first considered the correlation function of all CDFS 
sources regardless of their classification. We excluded from 
the sample only stars and extended X-ray sources associated with
galaxy groups/clusters. In addition we excluded from our calculations
3 low luminosity sources with L_0.5-10 < 10^40 erg s^-1,
in which the X-ray emission might be due to a single off- 
nuclear Ultra Luminous X-ray source in the host galaxy (ULX, 
see e.g. Fabbiano 1989) rather than to the global star formation 
rate or to the active nucleus. We note that Hornschemeier et al. 
(2004) found 10 ULX candidates, all of them with L_0.5-10 ≲

Fig. 10. Projected correlation functions for the total X-ray samples
in the CDFN (filled circles) and CDFS (open circles).
Errors are 1σ Poisson confidence intervals. The best fit power
laws are shown as dashed lines.

Fig. 11. AGN projected correlation functions in the CDFN
(filled circles) and CDFS (open circles). Errors are 1σ Poisson
confidence intervals. The best fit power laws are shown as
dashed lines.



10^40 erg s^-1, in the combined CDFS + CDFN sample covered
by the GOODS survey. Although ULX likely do not represent
the whole source population below 10^40 erg s^-1, we nevertheless
prefer to apply this luminosity cut since only a few sources
are lost and the considered sample should be cleaner. Overall, 
we are left with a sample of 124 sources. 

The correlation function was measured in the redshift range
z = 0 - 4 (median redshift z ~ 0.7) and on scales r_p =
0.16 - 20 h^-1 Mpc. Here and in the following samples a
power law fit is found to be an adequate representation of
the data. For the total CDFS sample we obtained a fully
acceptable value of χ²/dof = 6.2/4. The best fit correlation
length is r_0 = 8.6 ± 1.2 h^-1 Mpc. The slope of the correlation,
γ = 1.33 ± 0.11, is flatter than that commonly observed
for optically selected AGN and galaxies (γ ~ 1.6 - 1.8, e.g.
Le Fevre et al. 1996; Croom et al. 2001). Based on the error
on r_0 from this two-parameter fit, we conservatively estimate
the clustering signal to be detected at the ~ 7σ level. We verified
that projected separations above 0.16 h^-1 Mpc correspond
to angular separations above 5 arcsec for sources in the consid- 
ered redshift range. Although the FWHM of the Chandra Point 
Spread Function degrades with off-axis angle, it is still smaller 
than this value within 8 arcmin from the center of the field, 
where ~ 90% of our X-ray sources reside. Therefore, at the 
considered projected scales we do not expect any strong bias 
against pairs with small angular separations, which may arti- 
ficially flatten the observed correlation slope. In addition we 
checked if there is any bias against close pairs because e.g. of 
the constraints on the slit positioning on the masks used for op- 
tical spectroscopy. At any given separation we then computed 



the ratio between the number of pairs in which both sources 
have robust spectroscopic redshift and the total number of pairs 
at the same angular separation. In fact, this ratio is rather con- 
stant, decreasing by only ~ 25% at our smallest angular scales 
below ~ 20 arcsec: this has some effects only at the smallest 
r_p bins (at z = 0.7, the median redshift of our sample, 20 arcsec
correspond to ~ 0.17 h^-1 Mpc) where the clustering signal
has large uncertainties. Therefore no significant effects on the
overall best fit γ value are expected. The projected correlation
function of the total CDFS sample is shown in Fig. 10.
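
The power-law fit of the projected correlation function can be sketched as follows. The model assumed here is the standard projection of ξ(r) = (r/r_0)^-γ, namely w(r_p) = A(γ) r_0^γ r_p^(1-γ) with A(γ) = Γ(1/2) Γ[(γ-1)/2] / Γ(γ/2); the sketch further assumes symmetric errors and a simple grid search in γ, with the amplitude solved analytically at each grid point. The function names and the synthetic data points are illustrative only.

```python
import math


def amplitude_factor(gamma):
    """A(gamma) = Gamma(1/2) * Gamma((gamma-1)/2) / Gamma(gamma/2)."""
    return math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0) / math.gamma(gamma / 2.0)


def wp_model(r_p, r0, gamma):
    """Projected correlation function for a power-law xi(r) = (r/r0)^-gamma."""
    return amplitude_factor(gamma) * r0 ** gamma * r_p ** (1.0 - gamma)


def fit_power_law(r_p, w_p, w_err, gamma_grid=None):
    """Chi-square minimisation over a gamma grid; returns (r0, gamma, chi2)."""
    if gamma_grid is None:
        gamma_grid = [1.05 + 0.001 * i for i in range(1451)]  # 1.05 .. 2.50
    best = None
    for gamma in gamma_grid:
        shape = [r ** (1.0 - gamma) for r in r_p]
        # the model is linear in K = A(gamma) * r0^gamma, so K has a closed form
        num = sum(w * m / e ** 2 for w, m, e in zip(w_p, shape, w_err))
        den = sum(m * m / e ** 2 for m, e in zip(shape, w_err))
        k = num / den
        if k <= 0:
            continue
        chi2 = sum(((w - k * m) / e) ** 2 for w, m, e in zip(w_p, shape, w_err))
        if best is None or chi2 < best[2]:
            r0 = (k / amplitude_factor(gamma)) ** (1.0 / gamma)
            best = (r0, gamma, chi2)
    return best


# Synthetic, noise-free points generated with the CDFS best-fit values
radii = [0.2, 0.5, 1.3, 3.2, 8.0]
values = [wp_model(r, 8.6, 1.33) for r in radii]
errors = [0.2 * v for v in values]
r0_fit, gamma_fit, chi2_fit = fit_power_law(radii, values, errors)
print(f"r0 = {r0_fit:.2f} h^-1 Mpc, gamma = {gamma_fit:.3f}")
```

Since r_0 and γ are correlated in such a fit, fixing γ to a common value, as done later in this Section, is a natural way to compare amplitudes between subsamples.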

We checked how much these results depend on the choice 
of the random control sample. In particular we have relaxed 
the assumption of placing the random sources at the coordi- 
nates of the real sources, which might remove some signal due 
to angular clustering. As said above it is not appropriate to ran- 
domly distribute the control sources in the full field of view, 
since i) the X-ray sensitivity decreases from the center to the 
outskirts of the field, and ii) the masks used for optical spec- 
troscopy have been placed preferentially in the center of the 
field. As a first check we limited our analysis to the 110 sources 
within a circle with a radius of 8 arcmin from the center, where 
the optical coverage is highest and the X-ray exposure map is 
constant within ~ 20% across most of the field, with the exception
of a few narrow stripes with lower sensitivity due to the
gaps among ACIS-I CCDs (see e.g. Fig. 3 of Giacconi et al. 
2002). Accordingly, the sources of the control sample were 
randomly placed within this 8 arcmin circle. The best fit correlation
length and slope measured for this CDFS subsample
were found to be r_0 = 9.0 ± 1.1 h^-1 Mpc and γ = 1.38 ± 0.14, in
excellent agreement with the previously quoted values. We can
therefore estimate that the suppression in the clustering amplitude
produced by the use of the real coordinates is only of the
order of a few percent.

As a further, more refined, check we created a probability 
distribution map for the random sources, where the probabil- 
ity of finding a source at a given position is proportional to 
the number of real sources with measured redshift around that 
position. The map was obtained by repeatedly smoothing the 
distribution of real sources on the sky with a 20 arcsec boxcar 
(5 iterations). Random sources were then placed in the field ac- 
cording to the created probability map. This approach has the 
advantage of fully accounting for observational biases, avoid- 
ing at the same time the removal of angular clustering from the 
measured signal. Even in this case we found a high correlation 
length and a flat slope (r_0 = 9.1 ± 1.0 h^-1 Mpc; γ = 1.36 ± 0.10),
in agreement with the above derived values. In the light of these 
checks, in the following we will then simply place the ran- 
dom sources at the coordinates of the real sources, considering 
for each AGN or galaxy subsample only the positions of the 
sources in that subsample. 
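
The probability map described above can be illustrated with the sketch below: source positions are binned on a pixel grid, the grid is smoothed repeatedly with a square boxcar, and random positions are then drawn with probability proportional to the smoothed map. The pixel size, the zero padding at the map edges and all function names are assumptions of this example.

```python
import random


def boxcar_smooth(grid, half_width):
    """One pass of a square moving average; pixels beyond the edges count as zero."""
    ny, nx = len(grid), len(grid[0])
    size = (2 * half_width + 1) ** 2
    out = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            total = 0.0
            for yy in range(max(0, y - half_width), min(ny, y + half_width + 1)):
                for xx in range(max(0, x - half_width), min(nx, x + half_width + 1)):
                    total += grid[yy][xx]
            out[y][x] = total / size
    return out


def probability_map(sources, nx, ny, pixel_arcsec, boxcar_arcsec=20.0, iterations=5):
    """sources: (x, y) offsets in arcsec from the map corner. Returns a map summing to 1."""
    grid = [[0.0] * nx for _ in range(ny)]
    for x, y in sources:
        ix, iy = int(x // pixel_arcsec), int(y // pixel_arcsec)
        if 0 <= ix < nx and 0 <= iy < ny:
            grid[iy][ix] += 1.0
    # boxcar width expressed as an odd number of pixels, 2 * half_width + 1
    half_width = max(1, int(round(boxcar_arcsec / pixel_arcsec)) // 2)
    for _ in range(iterations):
        grid = boxcar_smooth(grid, half_width)
    norm = sum(map(sum, grid))
    return [[v / norm for v in row] for row in grid]


def draw_positions(pmap, n, pixel_arcsec, seed=None):
    """Draw n positions (arcsec) with probability proportional to the map."""
    rng = random.Random(seed)
    cells = [(ix, iy) for iy in range(len(pmap)) for ix in range(len(pmap[0]))]
    weights = [pmap[iy][ix] for ix, iy in cells]
    picks = rng.choices(cells, weights=weights, k=n)
    return [((ix + rng.random()) * pixel_arcsec, (iy + rng.random()) * pixel_arcsec)
            for ix, iy in picks]


# Hypothetical source offsets (arcsec) on a 240 x 240 arcsec map with 4 arcsec pixels
sources = [(100.0, 100.0), (104.0, 98.0), (150.0, 60.0)]
pmap = probability_map(sources, nx=60, ny=60, pixel_arcsec=4.0)
print(draw_positions(pmap, 3, 4.0, seed=0))
```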

Prompted by previous claims (Yang et al. 2003), we
checked if there is any difference in the clustering properties
of soft and hard X-ray selected sources. The best fit
parameters obtained for the 109 soft X-ray selected sources are
r_0 = 7.5 ± 1.4 h^-1 Mpc and γ = 1.34 ± 0.14, while for the
97 hard selected sources we obtained r_0 = 8.8 ± 2.3 h^-1 Mpc
and γ = 1.28 ± 0.14. Since the correlation length and slope are
correlated, and large uncertainties arise from the limited size of
the samples, we fixed γ to a common value to best evaluate any
possible difference in the clustering amplitude. When fixing γ
to 1.3, we found r_0 = 7.5 ± 0.6 h^-1 Mpc for the soft sample and
r_0 = 9.1 ± 0.8 h^-1 Mpc for the hard sample, which therefore
appears to be only marginally more clustered (see footnote 3).
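
The word "marginally" can be quantified with a simple estimate of the difference between two correlation lengths in units of their combined error. The sketch assumes independent Gaussian errors, which is only approximately true here since the soft and hard samples share many sources; the function name is hypothetical.

```python
import math


def difference_in_sigma(value_a, err_a, value_b, err_b):
    """Absolute difference of two measurements in units of their combined 1-sigma error."""
    return abs(value_a - value_b) / math.hypot(err_a, err_b)


# Soft vs hard CDFS samples with the slope fixed to 1.3 (values from the text)
print(f"{difference_in_sigma(7.5, 0.6, 9.1, 0.8):.1f} sigma")
```

With the quoted values the two amplitudes differ by about 1.6σ, which would shrink further if the true uncertainties were closer to twice the Poisson errors.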

We then considered only the 97 sources classified as AGN,
finding best fit values (r_0 = 10.3 ± 1.7 h^-1 Mpc, γ = 1.33 ± 0.14)
similar to those observed in the total sample (as could be expected
since AGN represent the vast majority of the identified
sources). The AGN correlation function is shown in Fig. 11.
Furthermore, we separated the total AGN sample into type 1 
and type 2 AGN (45 and 52 objects, respectively) according 
to the classification diagram of Section 3, without finding sig- 
nificant differences in their clustering properties (see Table 2). 
Because of the low statistics (only 27 objects) we cannot put
significant constraints on the galaxy correlation function.

Given the large errors introduced by low statistics, we fixed 
the slope of the correlation function to γ = 1.4 to search for
any difference in the r_0 values among different populations.
The adopted value is consistent with the average slopes measured
in the CDFS and in the CDFN. As expected, the r_0 values
measured for the various subsamples agree with those already
obtained by assuming γ as a free parameter, but have
smaller errors. A summary of the measurements performed in
this Section is given in Table 2. We finally checked our results
by fixing the slope of the correlation to γ = 1.8, which is the

Fig. 12. Projected correlation functions for AGN (circles) and
galaxies (triangles) in the CDFN. Errors are 1σ Poisson
confidence intervals. The best fit power laws are shown as dashed
lines.



value commonly observed in galaxy samples at low redshifts
(Davis & Peebles; Carlberg et al. 2000): while the fit is
significantly worse, the best fit r_0 values increase by only 15%.

4.2.2. CDFN 

Most of the considerations made for the CDFS sample are also 
valid for the CDFN sample. In particular a similar uneven dis- 
tribution on the field of the identified sources can be noticed in 
Fig-El so we kept placing the sources of the random sample at 
the coordinates of the real sources. 

We first measured the correlation length for all the CDFN 
sources excluding from our sample only objects with L_0.5-10 <
10^40 erg s^-1 (i.e. possible ULX), leaving a final sample of 240
sources. Although no detailed information on the fraction of
extended sources is given in Alexander et al. (2003), the detection
procedure adopted for the 2Msec CDFN data should be



3 For consistency with the other subsamples considered in this paper,
we quote in Table 2 the r_0 values obtained by fixing the slope to
γ = 1.4 rather than to γ = 1.3. Results are essentially unchanged.



optimized for point-like sources. In Alexander et al. (2003) it
is indeed mentioned that only a few sources are likely to be re- 
ally extended; their presence in the considered sample should 
therefore not affect significantly our results. 

We used again the redshift range z = 0 - 4 since only two
sources are beyond z = 4. The best fit parameters of the correlation
function, measured at a median redshift z ~ 0.8, are
r_0 = 4.2 ± 0.4 h^-1 Mpc and γ = 1.42 ± 0.07. Based on the
error on r_0, the clustering signal is then detected at the ~ 10σ
level. While the slope is similar to that found in the CDFS, the
clustering amplitude is significantly smaller. The projected correlation
function of the total CDFN sample is shown in Fig. 10,

Sample                 N     z      log L_0.5-10   r_0 [h^-1 Mpc]   γ              r_0 (γ = 1.4) [h^-1 Mpc]

1Msec CDFS
Total                  124   0.73   43.0           8.6 ± 1.2        1.33 ± 0.11    9.1 ± 0.6
Soft X-ray selected    109   0.73   43.0           7.5 ± 1.4        1.34 ± 0.14    7.6 ± 0.7
Hard X-ray selected    97    0.75   43.3           8.8 ± 2.2        1.28 ± 0.14    9.8 ± 0.8
AGN                    97    0.84   43.2           10.3 ± 1.7       1.33 ± 0.14    10.4 ± 0.8
type 1                 45    1.03   43.6           9.1 ± 3.3        1.46 ± 0.33    10.1 ...
type 2                 52    0.73   42.8           10.5 ± 2.2       1.40 ± 0.21    ...
galaxies               27    0.44   41.0

2Msec CDFN
Total                  240   0.84   42.4           4.2 ± 0.4        1.42 ± 0.07    4.1 ± 0.2
Soft X-ray selected    228   0.84   42.5           4.0 ± 0.4        1.42 ± 0.08    4.1 ± 0.3
Hard X-ray selected    149   0.90   43.0           5.2 ± 1.0        1.36 ± 0.13    5.0 ± 0.5
AGN                    160   0.96   43.0           5.5 ± 0.6        1.50 ± 0.12    ... /-0.5
type 1                 89    1.02   43.5           6.5 ± 0.8        1.89 ± 0.23    5.6 +0.8/...
type 2                 71    0.87   42.7           5.1 ± 1.3        1.52 ± 0.27    4.7 +0.8/-1.0
galaxies               80    0.45   41.3           4.0 ± 0.7        1.36 ± 0.15    4.4 +0.2/...

Table 2. Clustering measurements for different CDFS and CDFN subsamples. Errors are 1σ Poisson confidence levels. The
redshift range z = 0 - 4 was considered for all the above samples except for CDFN galaxies, where we used z = 0 - 1.5. The
considered sample, number of objects in each sample and their median redshift and luminosity are listed in Columns 1, 2, 3 and
4, respectively. The best fit correlation length and slope are quoted in Columns 5 and 6. The best fit correlation length obtained
by fixing the slope to γ = 1.4 is quoted in Column 7.

Fig. 13. Projected correlation functions for type 1 AGN (filled
circles) and type 2 AGN (open circles) in the CDFN. Errors are
1σ Poisson confidence intervals. The best fit power laws are
shown as dashed lines.



where it is also compared with that obtained for the total CDFS 
sample. 

Also in the CDFN we verified that the results do not change 
significantly when limiting the calculation to the sources within 
8 arcmin from the center (80% of the full sample) and placing 
the control sources randomly within this area. Also in this field 



the clustering properties of various subsamples are consistent
with each other, for example those of soft and hard X-ray
selected sources (228 and 149 objects in the two subsamples,
respectively), and those of AGN (160 objects) and galaxies (80
objects). The best fit clustering parameters for the various samples
are quoted in Table 2. The projected correlation function
of CDFN AGN is compared with that of CDFS AGN in Fig. 11
and with that of CDFN galaxies in Fig. 12.

As mentioned in Section 4.1, the average redshift of CDFN
galaxies seems to be higher in the center of the field than in the
outer regions. By means of a KS test we verified that the redshift
distributions of galaxies within and beyond 4 arcmin from
the center (38 and 42 objects, respectively) differ at the > 3.5σ
level. To check the possible effects on the measured correlation
function, we generated a first random sample by only consider- 
ing the positions and redshift distribution of the inner sources 
and a second random sample by considering only the redshifts 
and coordinates of the outer sources, and we finally pasted the 
two samples into one. In this way, the outer sources of the 
random sample have on average lower redshifts than the inner 
sources, as observed in the real sample. The galaxy correlation 
function measured using this refined random sample is found 
to be in excellent agreement with the previous measurement. 

Finally, we searched for any possible difference in the 
clustering properties of type 1 AGN (89 objects) and type 2 
AGN (71 objects). Although type 1s seem to have a higher
best fit correlation length and a steeper slope than type 2s
(r_0 = 6.5 ± 0.8 h^-1 Mpc and γ = 1.89 ± 0.23 vs r_0 = 5.1 ± 1.3 h^-1
Mpc and γ = 1.52 ± 0.27), the two subsamples agree within the
errors (Fig. 13).

Again, we checked our results by fixing γ to 1.4. Although
the r_0 values now have smaller errors, we did not find any additional
difference in the clustering properties of the various source
populations. Finally, we checked our results by fixing the correlation
slope to γ = 1.8, finding that the measured r_0 values
increase by ~ 15% as also seen in the CDFS. A summary of the
measurements performed in this Section is given in Table 2.



5. Discussion 

5.1. The variance of the clustering amplitude 

The X-ray exposure in the CDFN is twice that in the CDFS. 
It is therefore possible, in principle, that different populations 
with different clustering properties are being sampled in the 
two fields at the respective limiting fluxes. Indeed, as it can 
be easily seen in Table 2, the median luminosity for the to- 
tal source populations of the CDFN is lower than that of the 
CDFS. This effect is primarily due to the rise of the galaxy
population at very faint X-ray fluxes (see Fig. 2 and Section 3).
The median luminosities for the AGN samples are nonetheless
very similar in the CDFS and in the CDFN. We performed a
test by measuring the correlation function only for the CDFN
sources already detected in the first Msec catalog (Brandt et
al. 2001), which should guarantee an equal X-ray depth for
the CDFS and CDFN samples. For the sample of 189 1Msec
CDFN sources with robust spectroscopic redshift we found
essentially the same correlation length and slope found in the total
2Msec CDFN sample. Therefore, the variance in r_0 between
the CDFS and the CDFN cannot be ascribed to the different
depth of the X-ray observations. We note that the redshift selection
function obtained for the 1Msec CDFN is almost identical
to that obtained for the CDFS.

Also, as shown in Section 2, no systematic differences ap- 
pear in the follow-up programs of optical spectroscopy, with 
optically faint sources being equally observed in both fields.
As assessed by a KS test, the R magnitude distributions for
the sources in our two samples (i.e. those with robust redshift
measurements, Fig. 2) are indistinguishable, although there is a
marginal hint that the fraction of sources with R > 24 is slightly
higher in the CDFS than in the 2Msec CDFN (14 ± 4% and
9 ± 2%, respectively). When considering the R magnitudes of
the CDFN sources in the 1Msec catalog, these are distributed
as in the CDFS (again checked with a KS test) and the fraction
of faint (R > 24) sources is identical to that of the CDFS.
Therefore, the variance in the clustering amplitude cannot be 
explained by differences in the optical spectroscopy depth. As 
a final - perhaps redundant - test, it has been directly checked 
that the clustering amplitude in the two fields does not vary 
when considering only sources with R < 24. 

In addition, we checked the R-K colors of our sources. In 
both fields AGN are on average redder than galaxies. Indeed, 
AGN follow galaxy color tracks (see Szokoly et al. 2004 and 
Barger et al. 2003) but lie at higher redshifts than galaxies,
where galaxy tracks are redder. This can be understood by con- 
sidering that, since the majority of the AGN have low luminosi- 
ties and are in many cases obscured, the optical light is domi- 
nated by the contribution of the host galaxy. When comparing 
the R-K color distribution of the sources in the CDFS and in the 
CDFN we observed a very similar shape. This, combined with 

Fig. 14. Projected correlation function for the total CDFN sample
(filled circles) and the CDFS sample obtained by excluding
sources in the two spikes at z=0.67 and z=0.73 (open circles).
Errors are 1σ Poisson confidence intervals. The best fit power
laws are shown as dashed lines.



the uncertainties in the R-K color determination, does not allow
us to identify any difference between the two fields.

We note that about 1/3 of the identified CDFS sources lie
within the two prominent spikes at z = 0.67 and z = 0.73.
In the CDFN, although several redshift spikes are observed,
there are no such prominent structures. The two most populated
spikes in the CDFN (at z = 0.84 and z = 1.02) indeed contain
only about 1/8 of the total identified sources. As a check we
measured the projected correlation function for the total CDFS
sample excluding the sources in the two redshift spikes at z =
0.67 and z = 0.73, finding r_0 = 3.8 +... h^-1 Mpc and γ = 1.44 ±
0.37 (r_0 = 3.6 ± 0.9 h^-1 Mpc when fixing γ to 1.4), in good
agreement with the values measured for the total CDFN sample
(see Fig. 14). We can therefore conclude that most of the extra
clustering signal in the CDFS is due to these two structures.
We also verified that in the CDFN the clustering amplitude and
slope do not change significantly when removing the two most
populated spikes at z = 0.84 and z = 1.02.

We should also investigate if the observed variance might 
be induced by the high spectroscopic incompleteness of the 
CDFN and CDFS samples. When looking at the photometric 
redshifts (e.g. Zheng et al. 2004, Barger et al. 2003), it can be 
easily shown that unidentified objects lie on average at higher
redshifts than spectroscopically identified objects. The median 
redshift for the unidentified CDFS sources (including photo- 
z and low quality spectro-z) is indeed 1.15 (1.40 when con- 
sidering photo-z only), to be compared with 0.73, the median 
redshift of the sources with high quality spectra. In the CDFN 
the median redshift for unidentified sources is 1.17 (1.23 when






considering only photo-z), to be compared with the median 
value of 0.84 for the sources already identified. One of the 
most prominent redshift spikes in the CDFN is at z=1.02 (see 
Fig. 4), while the most prominent structures in the CDFS are
at z ~ 0.7. One might then speculate that the CDFN spike 
at z=1.02 is more incomplete than the CDFS spikes. Since at 
z ~ 0.7 - 1 it is difficult to identify sources with weak optical
emission lines or sources with absorption line dominated
spectra, these should be the main population missing from the 
spectroscopic samples. In the CDFS, where the information on 
the optical spectra and classification is fully available, we ver- 
ified that the best fit parameters of the correlation function do 
not vary significantly when excluding from the sample sources 
with absorption line dominated spectra or only weak emission 
lines. Therefore, spectroscopic incompleteness does not seem 
a viable argument to explain the different clustering amplitude 
between the CDFS and CDFN, which is rather due to genuine 
cosmic variance. We note that large field to field variance might 
indicate a strong clustering level, whose "real" amplitude can 
be assessed only with several measurements on independent 
fields. In principle, the likelihood of obtaining a given ro value 
for X-ray selected AGN in deep pencil-beam surveys could
be estimated by sampling several times a cosmological vol- 
ume obtained from N-body simulations, like e.g. the "Hubble 
Volume Simulations" by the Virgo Consortium (see Frenk et 
al. 2000 and references therein). Unfortunately, this method re- 
quires several assumptions on AGN formation and evolution 
within dark matter halos and needs careful and extensive tests 
to evaluate all the possible effects on the clustering amplitude 
of the considered objects. Such an analysis is beyond the scope 
of this paper and will be the subject of future work. 



An easier task, instead, is to see if the reported differences
in the number counts of the Chandra Deep Fields (e.g. Yang et
al. 2003; Bauer et al. 2004) are consistent with the fluctuations
produced by the correlation lengths r_0 = 5 - 10 h^-1 Mpc that we
measure. Very recently Bauer et al. (2004) have revisited the
logN-logS relations in the CDFS and CDFN finding general
agreement between the two fields, the maximum discrepancy
(significant at the ~ 4σ level) being ~ 40% for hard sources at
the faintest fluxes (f_2-10 ~ 4 × 10^-15 cgs; see their Fig. 5). Since
we are considering sources detected at the same limiting flux,
the difference in the observed surface density corresponds to
a volume density difference of the same size. The expected
cosmic variance in a given volume as a function of the amplitude
and slope of the correlation function can be estimated
using Eq. 3 of Somerville et al. (2004), which is a rearrangement
of Eq. 60.3 by Peebles (1980). Within comoving effective
volumes such as those surveyed by each Chandra Deep Field
(~ 2 × 10^5 h^-3 Mpc^3) and for a correlation slope γ = 1.4, the expected
cosmic variance is 30% and 50% for r_0 = 5 h^-1 Mpc and
r_0 = 10 h^-1 Mpc, respectively. Therefore, we conclude that the
reported differences in the number counts between the CDFS
and the CDFN are fully consistent with the correlation lengths
measured in this paper.
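
The quoted cosmic variance values can be checked with a short calculation. The sketch assumes the expression for a spherical volume of radius r and a power-law correlation function, σ² = J_2 (r_0/r)^γ with J_2 = 72 / [(3-γ)(4-γ)(6-γ) 2^γ], and treats the effective survey volume as a sphere of equal volume; the function names are illustrative.

```python
import math


def j2(gamma):
    """J_2 integral for a power-law correlation function of slope gamma."""
    return 72.0 / ((3.0 - gamma) * (4.0 - gamma) * (6.0 - gamma) * 2.0 ** gamma)


def cosmic_variance_sigma(r0, gamma, volume):
    """Relative rms density fluctuation in a sphere with the given volume.

    r0 in h^-1 Mpc, volume in h^-3 Mpc^3.
    """
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return math.sqrt(j2(gamma) * (r0 / radius) ** gamma)


for r0 in (5.0, 10.0):
    sigma = cosmic_variance_sigma(r0, 1.4, 2.0e5)
    print(f"r0 = {r0:4.1f} h^-1 Mpc -> sigma = {100 * sigma:.0f}%")
```

Under these assumptions the calculation returns fluctuations of roughly 30% and just under 50% for the two correlation lengths, in line with the values quoted in the text.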

5.2. Comparison with clustering of other X-ray
samples

Despite several efforts in the past years, only recently it has 
been possible to directly measure the spatial clustering of X- 
ray selected AGN. Carrera et al. ( 1998 1 found only a 2cr detec- 
tion in the ROSAT International X-ray Optical Survey (RIXOS, 
Mason et al. 2000) on scales < 40 - 80 h~ l Mpc. Interestingly, 
the 2cr signal detected in the RIXOS refers to the subsam- 
ple of sources in the redshift range 0.5-1.0, where the biggest 
structures in the CDFN and CDFS are also detected. The lack 
of clustering signal at z < 0.5 and z > 1 might be due to 
the small volume sampled and to the falling sensitivity of the 
RIXOS, respectively. More recently, Mullis et al. (2004) have 
measured the spatial correlation function of soft X-ray selected 
AGN in the ROSAT NEP survey (their clustering detection is 
at the ~ Act level). Using the same cosmology adopted here, 
they found a correlation length of ro ~ 7.4 ±1.8 hr x Mpc (y 
fixed to 1.8) for source pairs at a median redshift I = 0.22 
and in the scale range 5-60 h~ x Mpc. Also, when account- 
ing for the different cosmology adopted here, the correlation 
length of the RASS sources at a median redshift z = 0.15, mea- 
sured by Akylas et al. (2000) through angular clustering and 
Limber's equation, should be increased to ro = 6.6 +1.6 h' 1 
Mpc 4 . The correlation lengths measured at lower redshifts in 
the NEP and RASS surveys are intermediate between
those observed in the CDFS and in the CDFN. We stress that
the comparison between the Chandra Msec surveys and the
NEP and RASS surveys should be made with due care, since
they sample different luminosity regimes, and AGN clustering
is expected to be a function of luminosity if luminosity
correlates with the mass of the dark halo in which the AGN resides
(e.g. Kauffmann & Haehnelt 2002). The median 0.5-10 keV
luminosity of the NEP AGN (converted from the 0.5-2 keV
luminosity by assuming a spectrum with photon index 2) is
indeed $\log L_{0.5-10} = 44.4$, i.e. $\sim 20$ times higher than the median
luminosity in the CDFS and CDFN. This consideration
highlights that the Chandra Msec surveys are sampling a
population of AGN with rather low luminosities, for which no
information on clustering at $z \sim 1$ was available so far. Another
possible caveat is that we are comparing the soft X-ray
selected AGN in the NEP and in the RASS with the CDFS and
CDFN AGN, which were selected both in the soft and hard
bands. However, we did not observe any significant difference in
the clustering properties of soft and hard X-ray selected AGN
within each field. In Fig. 15 we show the correlation length
of X-ray selected AGN in the above mentioned surveys as a
function of redshift. Due to the variance in $r_0$ measured in the
CDFS and CDFN, no conclusion can be drawn on the evolution
(if any) of the clustering amplitude with redshift.
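
The band conversion mentioned above for the NEP sample can be checked with a short calculation. The sketch below assumes an unabsorbed pure power-law spectrum with photon index 2, for which the luminosity in a band scales as the logarithm of the ratio of the band edges; the function name `band_ratio` is an illustrative choice, and taking the Chandra median luminosity to be exactly $10^{43}$ erg s$^{-1}$ is an assumption made only for this estimate.

```python
import math

def band_ratio(e1_lo, e1_hi, e2_lo, e2_hi, photon_index=2.0):
    """Ratio L(e2_lo-e2_hi) / L(e1_lo-e1_hi) for a photon spectrum N(E) ~ E**(-photon_index)."""
    def band_integral(lo, hi):
        # Energy emitted in the band: integral of E * E**(-photon_index) dE
        if math.isclose(photon_index, 2.0):
            return math.log(hi / lo)
        p = 2.0 - photon_index
        return (hi**p - lo**p) / p
    return band_integral(e2_lo, e2_hi) / band_integral(e1_lo, e1_hi)

# Conversion factor from the 0.5-2 keV band to the 0.5-10 keV band
ratio = band_ratio(0.5, 2.0, 0.5, 10.0)
delta_log = math.log10(ratio)

# Luminosity contrast between log L = 44.4 (NEP) and an assumed log L = 43 (Chandra fields)
contrast = 10 ** (44.4 - 43.0)

print(f"L(0.5-10) / L(0.5-2) = {ratio:.2f} (offset of {delta_log:.2f} dex)")
print(f"NEP / Chandra median luminosity contrast = {contrast:.0f}")
```

Under these assumptions the conversion adds about 0.33 dex to the soft-band luminosity, and the contrast between the two luminosity regimes comes out at roughly 25, of the same order as the factor of $\sim 20$ quoted in the text.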



4 At $z = 0.15$ the average comoving separations in the $\Lambda$-dominated
cosmology adopted here are larger by $\sim 10\%$ with respect to the
Einstein-de Sitter cosmology adopted by Akylas et al. (2000).
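
The size of the correction in footnote 4 can be estimated with a minimal sketch that compares the two flat cosmologies at $z = 0.15$. It assumes that transverse separations scale with the comoving distance and radial separations with $c/H(z)$; the function names and the use of Simpson's rule are illustrative choices, and this is not necessarily the procedure followed to rescale the Akylas et al. (2000) value.

```python
import math

def inv_e_lcdm(z, omega_m=0.3, omega_l=0.7):
    """1/E(z) for the flat Lambda-dominated cosmology adopted in the paper."""
    return 1.0 / math.sqrt(omega_m * (1 + z) ** 3 + omega_l)

def inv_e_eds(z):
    """1/E(z) for the Einstein-de Sitter cosmology (Omega_m = 1)."""
    return (1 + z) ** -1.5

def comoving_distance(inv_e, z, steps=1000):
    """Comoving distance in units of c/H0, using Simpson's rule (steps must be even)."""
    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    return total * h / 3.0

z = 0.15
# Transverse separations scale with the comoving distance (both models are flat)
transverse_ratio = comoving_distance(inv_e_lcdm, z) / comoving_distance(inv_e_eds, z)
# Radial separations for a fixed redshift interval scale with c/H(z)
radial_ratio = inv_e_lcdm(z) / inv_e_eds(z)

print(f"transverse: {transverse_ratio:.3f}, radial: {radial_ratio:.3f}")
```

With these assumptions transverse separations are larger by about 7% and radial separations by about 15%, which brackets the average increase of $\sim 10\%$ quoted in the footnote.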
Fig. 15. Correlation length $r_0$ as a function of redshift for
different samples of X-ray selected AGN. From the lowest to the
highest redshift: RASS (Akylas et al. 2000); NEP (Mullis et al.
2004); CDFS and CDFN (this work).



5.3. Comparison with clustering of optically selected 
QSOs 

The best constraints on the clustering of optically selected
QSOs have been derived from the 2dF QSO Redshift Survey
(2QZ, Croom et al. 2001). Based on a sample of $> 10^4$ objects,
Croom et al. (2001) measured a QSO correlation length
and slope of $r_0 = 5.7 \pm 0.5\,h^{-1}$ Mpc and $\gamma = 1.56 \pm 0.10$ at a
median redshift of $z = 1.5$ and on scales 1-60 $h^{-1}$ Mpc comoving
(using the same cosmology adopted here). In addition,
thanks to the large number of QSOs in their sample, Croom
et al. (2001) were also able to investigate the QSO clustering
in different redshift slices. The correlation length measured in
their two lowest bins, at a median redshift comparable with that
of CDFS and CDFN AGN, is of the order of $r_0 = 4.7 \pm 0.9\,h^{-1}$
Mpc (for a fixed slope of $\gamma = 1.56$), which is comparable with
the correlation length measured for the CDFN AGN. Again, a
fully meaningful comparison is hampered by the different
luminosity regimes sampled by the 2QZ and the Chandra Msec
surveys. Assuming a standard QSO SED (Elvis et al. 1994),
the characteristic absolute magnitude of 2QZ QSOs at $z = 0.9$,
$M^{*} \sim -24.15$ (derived from the 2QZ luminosity function of
Croom et al. 2004), can be converted into an X-ray luminosity
of $\log L_{0.5-10} = 44.7$, well above the average values of CDFN
and CDFS AGN. In the local Universe the clustering of optical
QSOs has been recently measured by Grazian et al. (2004) by
means of the Asiago-ESO/RASS QSO survey (AERQS), which
selects the rarest and most luminous objects with $B < 15$ mag.
These authors measured a rather high correlation length of
$r_0 = 8.6 \pm 2.0\,h^{-1}$ Mpc at a median redshift of $\sim 0.1$ and
on scales 1-30 $h^{-1}$ Mpc comoving (again for a fixed slope
of $\gamma = 1.56$). The average 0.5-10 keV luminosity of their
QSO sample can be estimated to be $\log L_{0.5-10} = 44.4$. The
AERQS and the 2QZ data have been compared with QSO clustering
evolution models (Matarrese et al. 1997; Moscardini et
al. 1998) based on the Press-Schechter formalism for the evolution
of the dark matter halo mass function. In fact, Grazian et al.
(2004) and Croom et al. (2001) derive a minimum mass for the
dark matter halos where QSOs reside of $M_{\rm DMH} \sim 10^{13}\,h^{-1}\,M_\odot$.
Due to the present large uncertainties it is not yet possible to
put significant constraints on clustering evolution models with
X-ray selected AGN. We just note here that the clustering of X-ray
AGN is consistent with models with $M_{\rm DMH} \sim 10^{13}\,h^{-1}\,M_\odot$ if
the low $r_0$ value measured in the CDFN is typical at $z \sim 1$. On
the other hand, if the $r_0$ value measured in the CDFS is to be
considered typical, then $M_{\rm DMH}$ can be as high as $10^{14}\,h^{-1}\,M_\odot$.


5.4. Comparison with galaxy clustering 

Gilli et al. (2003) found that about 70-80% of the high significance
peaks seen in the redshift distribution of K-band selected
sources in a sub-area of the CDFS (the area covered by the
K20 survey, see Fig. 7; Cimatti et al. 2002) have a corresponding
peak in the X-rays. This implies that X-ray and K-band selected
sources are tracing the same underlying structures. Also,
it might be speculated from these samples that AGN cluster- 
ing is similar to that of early type galaxies, whose detection 
rate is higher in K-band rather than in optically selected sam- 
ples. The measurements of the spatial correlation function for 
the AGN in the CDFS seem to be in agreement with this idea, 
since the measured AGN correlation length is found to be similar
to that of Extremely Red Objects with $R - K > 5$ (EROs)
at $z \sim 1$, which are thought to be the progenitors of early type
galaxies (Daddi et al. 2001). Such a high clustering amplitude
is however not observed for the AGN in the 2 Msec CDFN, for
which $r_0$ is of the order of 5-6 $h^{-1}$ Mpc. If we then consider the
AGN correlation length to be in the range 5-10 $h^{-1}$ Mpc, this is still
consistent with AGN at $z \sim 1$ being generally hosted by early
type galaxies. Indeed Coil et al. (2003) have recently measured
the correlation length of a sample of $\sim 2000$ R-band selected
galaxies at $z = 0.7$-$1.25$ in the DEEP2 survey. With these
good statistics they were able to obtain an accurate measure
of the correlation function of early-type and late-type galaxies
separately (the latter being more numerous by a factor of $\sim 4$),
finding $r_0 = 6.61 \pm 1.12\,h^{-1}$ Mpc for early type galaxies and
$r_0 = 3.17 \pm 0.54\,h^{-1}$ Mpc for late type galaxies. Interestingly
enough, on scales of $r_p = 0.25$-$8\,h^{-1}$ Mpc, i.e. very similar to
those adopted in this paper, the slope of the correlation function
for early-type galaxies is found to be rather flat, $\gamma = 1.48 \pm 0.06$,
in agreement with that measured for the AGN in the CDFS
and in the CDFN (note however that Guzzo et al. (1997) found
$\gamma = 2.0 \pm 0.1$ for local early type galaxies). On the contrary,
the correlation slope for late-type galaxies is found to be
significantly steeper ($\gamma = 1.68 \pm 0.07$). To summarize, our results
are consistent with the idea that at $z \sim 1$ the population of
AGN with typical X-ray luminosity of $10^{43}$ erg s$^{-1}$ is preferentially
hosted by early-type galaxies. However, other deep X-ray
pointings in separate fields are needed to measure the average
clustering of X-ray selected AGN and obtain more stringent results.
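
To make the comparison between the quoted best-fit parameters more concrete, the sketch below evaluates the projected correlation function implied by a pure power law $\xi(r) = (r/r_0)^{-\gamma}$. It assumes the standard analytic result for integration along an infinite line of sight, $w(r_p) = r_p\,(r_0/r_p)^{\gamma}\,\Gamma(1/2)\,\Gamma[(\gamma-1)/2]/\Gamma(\gamma/2)$; the function name, the choice of $r_p = 1\,h^{-1}$ Mpc, and the dictionary layout are illustrative assumptions. The galaxy parameters are those quoted above from Coil et al. (2003), and the AGN parameters are those given in the abstract of this paper.

```python
import math

def projected_wp(rp, r0, gamma):
    """w(rp) for xi(r) = (r / r0)**(-gamma), integrated along an infinite line of sight."""
    amplitude = math.gamma(0.5) * math.gamma((gamma - 1) / 2) / math.gamma(gamma / 2)
    return rp * (r0 / rp) ** gamma * amplitude

# (r0 in h^-1 Mpc, gamma) as quoted in the text and in the abstract
samples = {
    "early-type galaxies": (6.61, 1.48),
    "late-type galaxies": (3.17, 1.68),
    "CDFS AGN": (10.3, 1.33),
    "CDFN AGN": (5.5, 1.50),
}

rp = 1.0  # projected separation in h^-1 Mpc (illustrative choice)
wp_values = {name: projected_wp(rp, r0, g) for name, (r0, g) in samples.items()}

for name, wp in wp_values.items():
    print(f"{name}: w(rp = {rp} h^-1 Mpc) = {wp:.1f} h^-1 Mpc")
```

Because the amplitude depends on both $r_0$ and $\gamma$, two samples with similar correlation lengths but different slopes can still differ appreciably in $w(r_p)$ at a given separation, which is why the slope is compared alongside the correlation length.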

6. Conclusions and future work 

We have measured the projected correlation function $w(r_p)$ of
X-ray selected AGN and galaxies in the 2 Msec Chandra Deep
Field North and in the 1 Msec Chandra Deep Field South on
scales $\sim 0.2$-$10\,h^{-1}$ Mpc. A significantly different amplitude
for AGN clustering has been observed in these $\sim 0.1$ deg$^2$
fields, the correlation length $r_0$ measured in the CDFS being a
factor of $\sim 2$ higher than in the CDFN. The observed difference
does not seem to be produced by any observational bias, and is
therefore likely due to cosmic variance. In both fields the slope
of the correlation function is found to be flat ($\gamma \sim 1.3$-$1.5$),
but consistent within the errors with that measured for optically
selected QSOs (Croom et al. 2001). The extra correlation
signal present in the CDFS is primarily due to the two prominent
spikes at $z = 0.67$ and $z = 0.73$ containing about 1/3 of the
identified sources. Indeed, although significant redshift spikes
are also observed in the CDFN, they are less prominent than
those observed in the CDFS. In the CDFN we were also able
to measure the clustering properties of X-ray selected galaxies,
which have been found to be similar to those of AGN in
the same field. Finally, within each field, we did not find
significant differences between the clustering properties of hard
X-ray selected and soft X-ray selected sources, or, similarly,
between type-1 and type-2 AGN.
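
As a rough illustration of the field-to-field difference, the sketch below combines the best-fit correlation lengths and $1\sigma$ Poisson errors quoted in the abstract. It assumes independent Gaussian errors added in quadrature, which is a simplification and not the analysis performed in the paper; the function name `difference_sigma` is illustrative.

```python
import math

def difference_sigma(a, err_a, b, err_b):
    """Difference between two measurements in units of their combined 1-sigma error."""
    return abs(a - b) / math.hypot(err_a, err_b)

# r0 in h^-1 Mpc: (CDFS value, error, CDFN value, error), from the abstract
total = difference_sigma(8.6, 1.2, 4.2, 0.4)   # all sources
agn = difference_sigma(10.3, 1.7, 5.5, 0.6)    # AGN only
ratio_agn = 10.3 / 5.5

print(f"all sources: {total:.1f} sigma")
print(f"AGN only: {agn:.1f} sigma, r0 ratio = {ratio_agn:.2f}")
```

Under these simplifying assumptions the two fields differ by about $3.5\sigma$ for the total samples and about $2.7\sigma$ for the AGN samples, with an AGN correlation length ratio of about 1.9.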

Significant improvements in the measurements of the AGN
spatial correlation function, and hence in the understanding of the
large scale structures in the X-ray sky, are expected from the
ongoing observations of the Extended Chandra Deep Field South
(E-CDFS, PI N. Brandt) and of the COSMOS-XMM field (PI
G. Hasinger). The E-CDFS is a deep-and-wide survey consisting
of 4 Chandra 250 ksec ACIS-I pointings arranged in
a square centered on the Msec CDFS. The final covered area
will be $\sim 0.3$ deg$^2$, i.e. a factor of 3 larger than that covered by
the Msec CDFS, with average sensitivities of $1 \times 10^{-16}$ erg cm$^{-2}$
s$^{-1}$ in the soft band and $1 \times 10^{-15}$ erg cm$^{-2}$ s$^{-1}$ in the hard band.
This will allow us to significantly enlarge the sample and reduce
the statistical uncertainties introduced by the small CDFS field of
view in the measurements of the clustering of Seyfert-like AGN
with average $\log L_{0.5-10} = 43$ erg s$^{-1}$. A detailed study of the
clustering of high-luminosity X-ray selected AGN will instead be
performed by the wide area COSMOS-XMM survey, consisting
of a mosaic of 25 short XMM pointings (32 ksec each) covering
a total 2.2 deg$^2$ field with a sensitivity of $1 \times 10^{-15}$ erg cm$^{-2}$
s$^{-1}$ in the soft band and $4 \times 10^{-15}$ erg cm$^{-2}$ s$^{-1}$ in the hard band.
The two projects are complementary and should constrain the
clustering properties of X-ray selected AGN as a function of
redshift and luminosity.